38 research outputs found

    Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains

    Nowadays there is a tremendous amount of unstructured data, often represented by texts, which is created and stored in a variety of forms in many domains such as patients' health records, social network comments, scientific publications, and so on. This volume of data represents an invaluable source of knowledge, but mining it is challenging for machines. At the same time, novel tools as well as advanced methodologies have been introduced in several domains, improving the efficacy and efficiency of data-based services. Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed the data into Machine Learning methodologies, and produce services or resources that facilitate the execution of certain tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through previously unexplored methodologies. Furthermore, this thesis investigates the use of state-of-the-art tools to move data from texts to graphs that represent the knowledge contained in scientific literature. Finally, the use of a Semantic Web ontology and novel heuristics to detect insights from biological data in the form of graphs is presented. The thesis contributes to the scientific literature in terms of both results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings.

    Understanding class representations: An intrinsic evaluation of zero-shot text classification

    Frequently, Text Classification is limited by insufficient training data. Zero-Shot Classification addresses this problem by including external class definitions and exploiting the relations between classes seen during training and unseen (zero-shot) classes. However, it requires a class embedding space capable of accurately representing the semantic relatedness between classes. This work defines an intrinsic evaluation based on greater-than constraints to provide a better understanding of this relatedness. The results imply that textual embeddings are able to capture more semantics than Knowledge Graph embeddings, but combining both modalities yields the best performance.
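
    A minimal sketch of how such a greater-than evaluation could be operationalized (the embeddings, constraint set, and cosine scoring below are illustrative assumptions, not the paper's exact protocol): each constraint states that one class pair should be more semantically related than another, and the class embedding space is scored by the fraction of constraints it satisfies.

```python
# Minimal sketch (not the paper's exact protocol): score a class embedding
# space by the fraction of greater-than constraints it satisfies, where each
# constraint says one class pair should be more similar than another.
# The embeddings and constraints below are toy assumptions.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def constraint_accuracy(class_emb, constraints):
    """class_emb: dict label -> vector.
    constraints: list of ((a, b), (c, d)) meaning sim(a, b) > sim(c, d)."""
    satisfied = sum(
        1 for (a, b), (c, d) in constraints
        if cosine(class_emb[a], class_emb[b]) > cosine(class_emb[c], class_emb[d])
    )
    return satisfied / len(constraints)

# Toy check: "soccer" should be closer to "tennis" than to "politics".
rng = np.random.default_rng(0)
emb = {c: rng.normal(size=50) for c in ["soccer", "tennis", "politics", "economy"]}
constraints = [(("soccer", "tennis"), ("soccer", "politics")),
               (("politics", "economy"), ("politics", "tennis"))]
print(constraint_accuracy(emb, constraints))
```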

    Towards a representation of temporal data in archival records: Use cases and requirements

    Archival records are essential sources of information for historians and digital humanists seeking to understand history. For modern information systems they are often analysed and integrated into Knowledge Graphs for better access, interoperability, and re-use. However, due to restrictions in the representation of RDF predicates, temporal data within archival records is challenging to model. This position paper describes requirements for modelling temporal data in archival records, based on ongoing research projects in which archival records are analysed and integrated into Knowledge Graphs for research and exploration.
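
    A small illustrative sketch of the underlying modelling problem (the rdflib-based pattern, namespace, and property names below are assumptions for illustration, not the paper's proposal): a plain RDF triple cannot directly carry a validity interval because predicates are binary, so temporal qualifiers are often attached to an intermediate node.

```python
# Illustrative sketch (not the paper's model): a plain triple such as
# (PersonA, holdsOffice, MayorOfBerlin) cannot carry "from 1900 to 1905"
# directly, because RDF predicates are binary. One common workaround is an
# intermediate node that carries the temporal qualifiers.
from rdflib import Graph, Namespace, Literal, BNode, RDF, XSD

EX = Namespace("http://example.org/")   # illustrative namespace
g = Graph()

tenure = BNode()                        # node representing the qualified statement
g.add((tenure, RDF.type, EX.OfficeTenure))
g.add((tenure, EX.person, EX.PersonA))
g.add((tenure, EX.office, EX.MayorOfBerlin))
g.add((tenure, EX.startDate, Literal("1900-01-01", datatype=XSD.date)))
g.add((tenure, EX.endDate, Literal("1905-12-31", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```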

    TF-IDF vs Word Embeddings for Morbidity Identification in Clinical Notes: An Initial Study

    Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of patients' health states. All these data can be analysed and employed to provide novel services that help people and domain experts with common healthcare tasks. However, many technologies such as Deep Learning and tools like Word Embeddings have started to be investigated only recently, and many challenges remain open when it comes to healthcare domain applications. To address these challenges, we propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within textual descriptions of clinical records. For this purpose, we have used a Deep Learning model based on Bidirectional Long Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector representations of data such as Word Embeddings. We have employed pre-trained Word Embeddings, namely GloVe and Word2Vec, as well as our own Word Embeddings trained on the target domain. Furthermore, we have compared the performance of the Deep Learning approaches against the traditional TF-IDF representation used with a Support Vector Machine and a Multilayer Perceptron (our baselines). The obtained results suggest that the latter outperforms the Deep Learning approaches regardless of the word embeddings used. Our preliminary results indicate that there are specific features that make the dataset biased in favour of traditional machine learning approaches.
    Comment: 12 pages, 2 figures, 2 tables; SmartPhil 2020, First Workshop on Smart Personal Health Interfaces, associated with ACM IUI 2020.
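
    A minimal sketch of the kind of TF-IDF baseline the study compares against, i.e. SVM and MLP classifiers over TF-IDF vectors in a one-vs-rest, multi-label setting (the toy notes, labels, and hyperparameters are illustrative and not taken from the paper):

```python
# Minimal sketch of a TF-IDF baseline of the kind the study compares against
# (SVM and MLP over TF-IDF vectors, one-vs-rest for multiple morbidity labels).
# The toy notes, labels, and hyperparameters below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

notes = [
    "patient with long standing hypertension and obesity",
    "history of asthma, uses inhaler daily",
    "obesity noted, advised diet and exercise",
    "no relevant morbidity reported",
]
labels = [["hypertension", "obesity"], ["asthma"], ["obesity"], []]

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(notes)
Y = MultiLabelBinarizer().fit_transform(labels)

svm = OneVsRestClassifier(LinearSVC()).fit(X, Y)        # one binary SVM per morbidity
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, Y)

print(svm.predict(X))
print(mlp.predict(X))
```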

    Ontology modelling for materials science experiments

    Materials are either an enabler of or a bottleneck for the vast majority of technological innovations. The digitization of materials and processes is mandatory to create live production environments which represent physical entities and their aggregations, and thus allow materials changes to be represented, shared, and understood. However, a common standard formalization of materials knowledge in the form of taxonomies, ontologies, or knowledge graphs has not yet been achieved. This paper sketches the efforts in modelling an ontology prototype to describe Materials Science experiments. It describes what is expected from the ontology by introducing a use case in which a process chain driven by the ontology enables the curation and understanding of experiments.

    TBVAC2020: Advancing tuberculosis vaccines from discovery to clinical development

    TBVAC2020 is a research project supported by the Horizon 2020 program of the European Commission (EC). It aims at the discovery and development of novel tuberculosis (TB) vaccines, from preclinical research to early clinical assessment. The project builds on previous collaborations, funded from 1998 onwards through the EC framework programs FP5, FP6, and FP7. It has succeeded in attracting new partners from outstanding laboratories all over the world, now totaling 40 institutions. Alongside the development of novel vaccines, TB biomarker development is also considered an important asset to facilitate rational vaccine selection and development. In addition, TBVAC2020 offers portfolio management that provides selection criteria for entry, gating, and priority setting of novel vaccines at an early developmental stage. The TBVAC2020 consortium, coordinated by TBVI, facilitates collaboration and early data sharing between partners with the common aim of working toward the development of an effective TB vaccine. Close links with funders and other consortia with shared interests further contribute to this goal.

    LexTex: a framework to generate lexicons using WordNet word senses in domain specific categories

    Lexicons have risen as alternative resources to common supervised methods for classification or regression in different domains (e.g., Sentiment Analysis). However, such resources often lack important domain context, and it is not possible to tune, edit, or improve them for new domains and data. With the exponential production of data and annotations witnessed today in several domains, leveraging lexical resources to improve existing lexicons becomes a must. In this work, a novel framework is provided to build lexicons independently of the target domain and of the input categories into which each text needs to be classified. It employs state-of-the-art Natural Language Processing and Word Sense Disambiguation tools and techniques to make the method as general as possible. The framework takes as input a heterogeneous collection of texts annotated with a fixed number of categories. Its output is a list of WordNet word senses with weights for each category. We prove the effectiveness of the framework taking the Emotion Detection task as a case study, employing the generated lexicons within that domain. Additionally, the paper shows a use case on human-robot interaction within the Emotion Detection task. Furthermore, we applied our methodology in several other domains and compared our approach against common supervised methods (regressors), showing the effectiveness of the generated lexicons. By freely providing the framework, we aim to encourage and disseminate the production of context-aware and domain-specific lexicons in other domains as well.
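
    A rough sketch of the general idea, not the LexTex implementation: disambiguate words in category-annotated texts into WordNet senses and accumulate normalized per-category weights (NLTK's Lesk algorithm is used here purely as a stand-in WSD component).

```python
# Rough sketch of the general idea (not the LexTex implementation): map words
# in category-annotated texts to WordNet senses with a WSD step (NLTK's Lesk
# here, as a stand-in) and accumulate normalized per-category weights.
# Requires nltk plus the 'punkt' and 'wordnet' corpora.
from collections import Counter, defaultdict
from nltk import word_tokenize
from nltk.wsd import lesk

def build_lexicon(annotated_texts):
    """annotated_texts: iterable of (text, category) pairs.
    Returns dict: category -> {synset_name: weight}."""
    counts = defaultdict(Counter)
    for text, category in annotated_texts:
        tokens = word_tokenize(text.lower())
        for token in tokens:
            synset = lesk(tokens, token)        # word sense disambiguation
            if synset is not None:
                counts[category][synset.name()] += 1
    lexicon = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        lexicon[category] = {sense: c / total for sense, c in counter.items()}
    return lexicon

corpus = [("I am so happy and delighted today", "joy"),
          ("This is terrifying and I am scared", "fear")]
print(build_lexicon(corpus))
```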

    An advanced algorithm for fetal heart rate estimation from non-invasive low electrode density recordings

    Non-invasive fetal electrocardiography is still an open research issue. The recent publication on PhysioNet of an annotated dataset providing four-channel non-invasive abdominal ECG traces promoted an international challenge on the topic. Starting from that dataset, an algorithm was developed for the identification of fetal QRS complexes from a reduced number of electrodes and without any a priori information about electrode positioning, entering the top ten best-performing open-source algorithms presented at the challenge. In this paper, an improved version of that algorithm is presented and evaluated using the same challenge metrics. It is mainly based on the subtraction of the maternal QRS complexes in every lead, obtained by synchronized averaging of morphologically similar complexes, the filtering of the maternal P and T waves, and the enhancement of the fetal QRS through independent component analysis (ICA) applied to the processed signals before a final fetal QRS detection stage. The RR time series of both the mother and the fetus are analysed to enhance pseudo-periodicity, with the aim of correcting wrong annotations. The algorithm was designed and extensively evaluated on the open dataset A (N=75), and finally evaluated on datasets B (N=100) and C (N=272) to obtain mean scores over data not used during algorithm development. Compared to the results achieved by the previous version of the algorithm, the current version would rank 5th and 4th in the final rankings for events 1 and 2, reserved for the open-source challenge entries, taking into account both official and unofficial entrants. On dataset A, the algorithm achieves 0.982 median sensitivity and 0.976 median positive predictivity.
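
    A simplified sketch of two of the stages described above, run on synthetic data: maternal QRS template subtraction via synchronized averaging, followed by ICA across the residual channels (window lengths, peak thresholds, and the toy signal are placeholders, not the algorithm's actual settings).

```python
# Simplified sketch of two stages described above, on synthetic data:
# (1) subtract a maternal QRS template obtained by synchronized averaging,
# (2) run ICA on the residual channels to enhance the weaker (fetal) component.
# Window lengths, thresholds, and the toy signal are placeholders only.
import numpy as np
from scipy.signal import find_peaks
from sklearn.decomposition import FastICA

fs = 1000                                    # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
# Toy 4-channel abdominal mixture: strong "maternal" spikes plus weak "fetal" spikes.
maternal = (np.sin(2 * np.pi * 1.2 * t) > 0.999).astype(float)
fetal = (np.sin(2 * np.pi * 2.1 * t) > 0.999).astype(float)
X = np.stack([rng.uniform(0.8, 1.2) * maternal + rng.uniform(0.1, 0.3) * fetal
              + 0.01 * rng.normal(size=t.size) for _ in range(4)])

def subtract_maternal(channel, half_window=60):
    """Remove a maternal QRS template obtained by synchronized averaging."""
    peaks, _ = find_peaks(channel, height=0.5 * channel.max(), distance=int(0.4 * fs))
    windows = [channel[p - half_window:p + half_window]
               for p in peaks if half_window <= p < channel.size - half_window]
    template = np.mean(windows, axis=0)
    cleaned = channel.copy()
    for p in peaks:
        if half_window <= p < channel.size - half_window:
            cleaned[p - half_window:p + half_window] -= template
    return cleaned

residual = np.stack([subtract_maternal(ch) for ch in X])
sources = FastICA(n_components=4, random_state=0).fit_transform(residual.T).T
best = np.argmax([np.max(np.abs(s)) / np.std(s) for s in sources])   # crude spiky-component pick
fetal_peaks, _ = find_peaks(np.abs(sources[best]),
                            height=4 * np.std(sources[best]), distance=int(0.25 * fs))
print(len(fetal_peaks), "candidate fetal QRS detections")
```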

    Deep Learning meets Knowledge Graphs for Scholarly Data Classification

    The amount of scientific literature grows continuously, which poses an increasing challenge for researchers to manage, find, and explore research results. Therefore, the classification of scientific work is widely applied to enable retrieval, to support the search for suitable reviewers during the reviewing process, and in general to organize the existing literature according to a given schema. The automation of this classification process not only simplifies the submission process for authors, but also ensures the coherent assignment of classes. However, fine-grained classes and new research fields in particular do not provide sufficient training data to automate the process. Additionally, given the large number of non-mutually-exclusive classes, it is often difficult and computationally expensive to train models able to deal with multi-class, multi-label settings. To overcome these issues, this work presents a preliminary Deep Learning framework as a solution for multi-label text classification of scholarly papers in Computer Science. The proposed model addresses the issue of insufficient data by utilizing the semantics of classes, explicitly provided by latent representations of the class labels. This study uses Knowledge Graphs as a source of these required external class definitions, identifying corresponding entities in DBpedia to improve the overall classification.
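
    A bare-bones sketch of the core idea, not the paper's architecture: score a document representation against latent class-label vectors (e.g. derived from DBpedia entities) and apply a per-class sigmoid and threshold to obtain a multi-label output (all vectors and the threshold are toy assumptions).

```python
# Bare-bones sketch of the core idea (not the paper's architecture): score a
# document representation against latent class-label vectors (stand-ins for
# embeddings of DBpedia entities) and threshold per class for multi-label output.
import numpy as np

rng = np.random.default_rng(42)
class_labels = ["Machine_learning", "Semantic_Web", "Computer_network"]
class_emb = {c: rng.normal(size=100) for c in class_labels}   # stand-in for KG embeddings
doc_emb = rng.normal(size=100)                                # stand-in for a text encoder output

def predict_labels(doc_vec, class_emb, threshold=0.5):
    scores = {}
    for label, vec in class_emb.items():
        logit = np.dot(doc_vec, vec) / np.sqrt(doc_vec.size)  # scaled dot-product score
        scores[label] = 1.0 / (1.0 + np.exp(-logit))          # per-class sigmoid (multi-label)
    predicted = {label: s for label, s in scores.items() if s >= threshold}
    return predicted, scores

predicted, all_scores = predict_labels(doc_emb, class_emb)
print(predicted)
```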

    Modeling and extending ecological networks using land similarity

    Complex network analysis is applied to topological models of ecological networks, both to extrapolate their advanced properties and as part of land management activities. Commonly employed methods tend to focus on a single target species. This is satisfactory for cognitive analysis, but the limited view provided by these models results in a lack of the general information needed for land planning. Similarity scores computed for pairs of nature protection areas are proposed as a building block of a general model to address this shortcoming.
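
    A toy sketch of how such similarity scores could induce a species-agnostic network of protection areas (the land-cover attributes, cosine similarity, and threshold below are illustrative assumptions, not the paper's model).

```python
# Illustrative sketch (attributes, scores, and threshold are made up): link
# pairs of protection areas whose land-attribute vectors are similar enough,
# yielding a species-agnostic network that complements single-species models.
import networkx as nx
import numpy as np

# Toy land-cover composition vectors (e.g. fractions of forest, wetland, grassland).
areas = {
    "SiteA": np.array([0.7, 0.1, 0.2]),
    "SiteB": np.array([0.6, 0.2, 0.2]),
    "SiteC": np.array([0.1, 0.8, 0.1]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

G = nx.Graph()
G.add_nodes_from(areas)
for u in areas:
    for v in areas:
        if u < v:
            score = cosine(areas[u], areas[v])
            if score >= 0.8:                  # illustrative similarity threshold
                G.add_edge(u, v, weight=score)

print(list(G.edges(data=True)))
```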